46 research outputs found

    Variable Selection Techniques for Clustering on the Unit Hypersphere

    Mixtures of von Mises-Fisher distributions have been shown to be an effective model for clustering data on a unit hypersphere, but variable selection for these models remains an important and challenging problem. In this paper, we derive two variants of the expectation-maximization framework, each of which identifies a specific type of irrelevant variable for these models. The first type consists of noise variables, which are not useful for separating any pairs of clusters. The second type consists of redundant variables, which may be useful for separating pairs of clusters, but do not enable any additional separation beyond the separability provided by some other variables. Removing these irrelevant variables is shown to improve cluster quality in simulated as well as benchmark datasets.

    Confidence Intervals for Prevalence Estimates from Complex Surveys with Imperfect Assays

    We present several related methods for creating confidence intervals to assess disease prevalence in a variety of survey sampling settings. These include simple random samples with imperfect tests, weighted sampling with perfect tests, and weighted sampling with imperfect tests, with the first two settings considered special cases of the third. Our methods use survey results and measurements of test sensitivity and specificity to construct melded confidence intervals. We demonstrate that our methods appear to guarantee coverage in simulated settings, while competing methods are shown to achieve much lower than nominal coverage. We apply our method to a seroprevalence survey of SARS-CoV-2 in undiagnosed adults in the United States between May and July 2020. Comment: 45 pages, 35 figures

    Semi-parametric modeling of SARS-CoV-2 transmission in Orange County, California using tests, cases, deaths, and seroprevalence data

    Mechanistic modeling of SARS-CoV-2 transmission dynamics and frequently estimating model parameters using streaming surveillance data are important components of the pandemic response toolbox. However, transmission model parameter estimation can be imprecise, and sometimes even impossible, because surveillance data are noisy and not informative about all aspects of the mechanistic model. To partially overcome this obstacle, we propose a Bayesian modeling framework that integrates multiple surveillance data streams. Our model uses both SARS-CoV-2 diagnostic test and mortality time series to estimate our model parameters, while also explicitly integrating seroprevalence data from cross-sectional studies. Importantly, our data generating model for incidence data takes into account changes in the total number of tests performed. We model transmission rate, infection-to-fatality ratio, and a parameter controlling a functional relationship between the true case incidence and the fraction of positive tests as time-varying quantities and estimate changes of these parameters nonparametrically. We apply our Bayesian data integration method to COVID-19 surveillance data collected in Orange County, California between March, 2020 and March, 2021 and find that 33-62% of Orange County residents experienced SARS-CoV-2 infection by the end of February, 2021. Despite this high number of infections, our results show that the abrupt end of the winter surge in January, 2021, was due to both behavioral changes and a high level of accumulated natural immunity. Comment: 37 pages, 16 pages of main text, including 5 figures, 1 table

    Lectin-Dependent Enhancement of Ebola Virus Infection via Soluble and Transmembrane C-type Lectin Receptors

    Mannose-binding lectin (MBL) is a key soluble effector of the innate immune system that recognizes pathogen-specific surface glycans. Surprisingly, low-producing MBL genetic variants that may predispose children and immunocompromised individuals to infectious diseases are more common than would be expected in human populations. Since certain immune defense molecules, such as immunoglobulins, can be exploited by invasive pathogens, we hypothesized that MBL might also enhance infections in some circumstances. Consequently, the low and intermediate MBL levels commonly found in human populations might be the result of balancing selection. Using model infection systems with pseudotyped and authentic glycosylated viruses, we demonstrated that MBL indeed enhances infection of Ebola, Hendra, Nipah and West Nile viruses in low complement conditions. Mechanistic studies with Ebola virus (EBOV) glycoprotein pseudotyped lentiviruses confirmed that MBL binds to N-linked glycan epitopes on viral surfaces in a specific manner via the MBL carbohydrate recognition domain, which is necessary for enhanced infection. MBL mediates lipid-raft-dependent macropinocytosis of EBOV via a pathway that appears to require less actin or early endosomal processing compared with the filovirus canonical endocytic pathway. Using a validated RNA interference screen, we identified C1QBP (gC1qR) as a candidate surface receptor that mediates MBL-dependent enhancement of EBOV infection. We also identified dectin-2 (CLEC6A) as a potentially novel candidate attachment factor for EBOV. Our findings support the concept of an innate immune haplotype that represents critical interactions between MBL and complement component C4 genes and that may modify susceptibility or resistance to certain glycosylated pathogens. 
    Therefore, higher levels of native or exogenous MBL could be deleterious in the setting of relative hypocomplementemia, which can occur genetically or because of immunodepletion during active infections. Our findings confirm our hypothesis that the pressure of infectious diseases may have contributed in part to evolutionary selection of MBL mutant haplotypes.

    Closure and the Book of Virgil


    Medulloblastoma in childhood: revisiting intrathecal therapy in infants and children
